DETAILED ACTION



Claim Rejections - 35 USC §112 



1. The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

2. Claims 1-10 are rejected under 35 U.S.C. 112, second paragraph, as being 
indefinite for failing to particularly point out and distinctly claim the subject matter which 
applicant regards as the invention. 

Claim 1 recites the limitation "said offsets" in the 6th ¶. There is insufficient
antecedent basis for this limitation in the claim.



Claim Rejections - 35 USC § 103



3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the
invention was made to a person having ordinary skill in the art to which said subject matter pertains.
Patentability shall not be negatived by the manner in which the invention was made.



4. Claims 1-5 and 7-8 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Kuhn et al. [1], (US Patent No. 6,571,208) in view of Kuhn et al. [2], ("Rapid
speaker adaptation in eigenvoice space," November, 2000) and Padmanabhan et al.,
("Speaker clustering and transformation for speaker adaptation in speech recognition
systems", January, 1998).

Regarding claim 1, Kuhn et al. [1] disclose:

A method for developing context dependent acoustic models, comprising the
steps of: 

developing a low-dimensional space (K-space) from training speech data 
obtained from a plurality of training speakers (Fig. 2, 20-26; col. 4, L50-68; col. 5, L.1- 
33); 

representing said speaker dependent component as centroids (Fig.1; col. 5, L.34- 
40) within said low-dimensional space (Fig.2, 28; col. 5, L.34-40); [The centroids are 
speaker-dependent components of the training speech data (col.4, L. 11-14; col. 9, L.12- 
13).] 

representing said speaker independent component (speaker-adjusted acoustic 
data) as linear transformations of said centroids (Fig.2, 30); [First, context-independent 
implies speaker-dependent (col.1, L. 39-40) and context-dependent implies speaker- 
independent (col. 3, L.3-4). Next, the allophone-relevant data (speaker-adjusted data), 
the result of the centroid subtraction process, is context-dependent, speaker- 
independent (col.1, L.21-37; col. 7, L.45-51). Finally, the linear transformation of 
centroids is met by the centroid subtraction process (col.8, L. 14-30). Therefore, it can
be deduced that the linear transformation of centroids (speaker-adjusted data) represents the
speaker-independent (context-dependent) component of the training speech data.]

representing the training speech data from each of said plurality of training 
speakers as the combination of a speaker dependent component (centroids) and a 
speaker independent component (speaker-adjusted data). This is shown in Fig. 1 and
Fig.2, 16, 28, 30. [Since the subtraction of the speaker-dependent component
(centroids) from the training speech data yields the speaker-independent component
(speaker-adjusted data), it can be deduced that the training speech data represents a
combination of a speaker-dependent component and a speaker-independent
component.]
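
The decomposition relied on in the bracketed reasoning above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each speaker's centroid is the component-wise mean of that speaker's feature vectors; the function names and the toy data are hypothetical and are not taken from Kuhn et al. [1] or from the claims.

```python
def centroid(frames):
    """Component-wise mean of one speaker's feature vectors (assumed centroid)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def subtract_centroid(frames, c):
    """Speaker-adjusted data: each feature vector minus the speaker's centroid."""
    return [[x - ci for x, ci in zip(frame, c)] for frame in frames]


def recombine(offsets, c):
    """Add the centroid back to the offsets to recover the original data."""
    return [[o + ci for o, ci in zip(offset, c)] for offset in offsets]


# Hypothetical toy data: two training speakers, two 2-D feature vectors each.
training_data = {
    "speaker_a": [[1.0, 2.0], [3.0, 4.0]],
    "speaker_b": [[10.0, 0.0], [12.0, 2.0]],
}

# For each speaker, store (centroid, speaker-adjusted offsets).
decomposed = {}
for speaker, frames in training_data.items():
    c = centroid(frames)
    decomposed[speaker] = (c, subtract_centroid(frames, c))
```

In this toy data the two speakers have different centroids but identical offsets, and adding each centroid back to its offsets recovers the original training data, which is the sense in which the data is a combination of the two components.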

Kuhn et al. [1] show that the centroid vector consists of the concatenated
Gaussian mean vectors (col. 3, L. 37-44) but do not show maximum likelihood
re-estimation, on said training speech data, of at least one of said low-dimensional space,
said centroids, and said offsets to represent context dependent acoustic models.

However, Kuhn et al. [2] teach: 

maximum likelihood estimation of centroids (Fig.1, "ONLINE STEPS"; sect. C, 
p.697; sect. D, p.698-699); and 

maximum likelihood re-estimation of centroids (Fig.1, "ONLINE STEPS"; p.697,
col.1, L.9-10 of 1st ¶; p.697, col. 2, the ¶ after equation 8). [The re-estimation process of
centroids is performed by iterative estimation of centroids.]

Padmanabhan et al. teach: 

maximum likelihood re-estimation of offset (linear transformed data or adjusted- 
speaker data). See Fig. 1, "Re-estimation of Gaussians"; p. 74, col.1, sect. D. [The linear 
transformation of the speech data corresponds to the context-dependent component 
(offset or adjusted-speaker data).] 

It would have been obvious to a person of ordinary skill in the art at the time the
invention was made to modify the method for developing context-dependent acoustic
models of Kuhn et al. [1] to include the maximum likelihood re-estimation on the
centroids of Kuhn et al. [2] and the maximum likelihood re-estimation on the offset
(linear transformed data) of Padmanabhan et al. in order to reduce the number of
parameters to be estimated for the new speaker (Kuhn et al. [2], p.697, 2nd ¶) and to
bring the new speaker acoustically closer to the training speaker, i.e. to remove
speaker-dependent idiosyncrasies (Padmanabhan et al., 2nd ¶ of sect. I, "Introduction"),
thereby providing a faster and more accurate speaker adaptation method in speech
recognition.

Regarding claim 2, Kuhn et al. [1] show: 

training speech data (combination of a speaker-independent component and a 
speaker-dependent component) is separated by identifying context dependent data 
(adjusted-speaker data) and using said context dependent data (adjusted-speaker data) 
to identify said speaker independent data (speaker-independent component). See Fig. 
2, 30. [The speaker-adjusted data are speaker-independent components of the training 
speech data.] 

Regarding claim 3, Kuhn et al. [1] show: 

training speech data (combination of a speaker-independent component and a 
speaker-dependent component) is separated by identifying context independent data 
(centroids) and using said context independent data (centroids) to identify said speaker 
dependent data (speaker-dependent component). See Fig.2, 28. [The centroids are 
speaker-dependent components of the training speech data.] 

Regarding claim 4, Kuhn et al. [2] show: 

maximum likelihood re-estimation step is performed iteratively (Fig. 1, "ONLINE
STEPS"; p.697, col.1, L.9-10 of 1st ¶; p.697, col.2, the ¶ after equation 8).

Regarding claim 5, Kuhn et al. [2] show:

linear transformations are effected as offsets (speaker-adjusted data) from
said centroids (Fig. 2, 30). The linear transformation of centroids is met by the centroid
subtraction process (col. 8, L. 14-30).

Regarding claim 7, Kuhn et al. [1] show: 

linear transformations of said centroids (adjusted-speaker data) are represented 
in tree data structures corresponding to individual sound units (Fig.1, 32; col. 7, L. 52-55).

Regarding claim 8, Kuhn et al. [1] show:

offsets (adjusted-speaker data) are represented in tree data structures 
corresponding to individual sound units (Fig.1, 32; col.7, L. 52-55). 

5. Claims 9 and 10 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Kuhn et al. [1], in view of Kuhn et al. [2] and Padmanabhan et al., and further in view of 
Kuhn et al. [3] (US Patent No. 6,141,644). 

Regarding claims 9-10, the modified Kuhn et al. [1] do not show using speaker 
dependent component to perform speaker verification or identification. 

However, Kuhn et al. [3] teach: 

using said speaker dependent component to perform speaker verification (Fig.4, 
44-58, 62, 64; col. 16, L.50-58; col.7, L10-17). 

using said speaker dependent component to perform speaker identification 
(Fig.4, 44-58, 66, 68; col.16, L.50-58; col.7, L10-17). 

It would have been obvious to a person of ordinary skill in the art at the time the
invention was made to further modify the method for developing context-dependent
acoustic models of Kuhn et al. [1], Kuhn et al. [2], and Padmanabhan et al. to include
the speaker verification and identification method of Kuhn et al. [3] in order to perform
authentication of users in applications such as conducting financial transactions over the
telephone (Kuhn et al. [3], col. 1, L. 10-25).



Allowable Subject Matter



6. Claim 6 would be allowable if rewritten to overcome the rejection(s) under 35
U.S.C. 112, second paragraph, set forth in this Office action and to include all of the
limitations of the base claim and any intervening claims.

Regarding claim 6, search and analysis of references do not show:

a maximum likelihood re-estimation step that generates a re-estimated
low-dimensional space, re-estimated centroids and re-estimated offsets; and

context-dependent acoustic models are constructed using the re-estimated
low-dimensional space and the re-estimated offsets.



Conclusion



7. The prior art made of record and not relied upon is considered pertinent to
applicant's disclosure.

US Patent Documents:

A) Gao et al.          06/2000    6,073,096

B) Digalakis et al.    11/1999    5,864,810

C) Kuhn et al.         12/2001    6,327,565 B1

D) Kuhn et al.         01/2002    6,343,267 B1

Other Publications:

E) Hazen et al., "A comparison of novel techniques for instantaneous speaker
adaptation," Proc. of Eurospeech97, pp.2047-2050.

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Tim Lao whose telephone number is 703-305-8955. 
The examiner can normally be reached on M-F, 8:30am-5pm. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Doris To, can be reached on 703-305-4827. The fax phone number for the
organization where this application or proceeding is assigned is 703-305-9508.

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the receptionist whose telephone number is
703-305-9000.
Examiner